container-encapsulate: make build_mapping_recurse significantly faster #4768

Merged: cgwalters merged 2 commits into coreos:main from rpmdb_cache_with_treefs on Mar 29, 2024

Conversation

tymlipari
Contributor

@tymlipari tymlipari commented Jan 11, 2024

While toying around with building my own custom FCOS builds, I noticed that running cosa build container with a package set similar to Silverblue's resulted in ~2hr builds, the vast majority of which was in the "Building package mapping" task. After this change, the runtime on my build shrank to ~15 mins.

$ time cosa build container
Before

real    107m47.769s
user    52m14.763s
sys     46m38.546s

After

real    15m37.333s
user    2m38.751s
sys     0m14.410s

The speedup comes from no longer querying the rpmdb for every file. Instead, the rpmdb is walked once to build a cache mapping each file to the packages that provide it, so that the later walk of the ostree filesystem only needs to consult the cache. The cache is structured similarly to rpm's internals, where paths are stored as separate basename and dirname entries. Also like rpm, two paths are considered equivalent if their dirnames resolve to the same location (rpm uses stat to compare inodes; this implementation resolves the symlinks). The resulting output is effectively equivalent to the previous implementation's while being substantially faster.
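As a rough illustration of the cache described above, here is a minimal sketch in Rust. It is not the code from this PR: `FileOwnerCache`, `insert`, `owners`, and `toy_resolve_dir` are hypothetical names, and the resolver simply assumes that `/lib` is a symlink to `/usr/lib` instead of consulting a real tree.

```rust
use std::collections::HashMap;

/// Hypothetical cache: resolved dirname -> basename -> owning packages.
#[derive(Default)]
pub struct FileOwnerCache {
    by_dir: HashMap<String, HashMap<String, Vec<String>>>,
}

impl FileOwnerCache {
    /// Record one (dirname, basename) entry from a package's file list.
    /// `resolve_dir` stands in for symlink resolution against the target tree.
    pub fn insert(
        &mut self,
        pkg: &str,
        dirname: &str,
        basename: &str,
        resolve_dir: &dyn Fn(&str) -> String,
    ) {
        let dir = resolve_dir(dirname);
        self.by_dir
            .entry(dir)
            .or_default()
            .entry(basename.to_string())
            .or_default()
            .push(pkg.to_string());
    }

    /// Look up the packages owning `basename` in an already-resolved directory.
    pub fn owners(&self, dirname: &str, basename: &str) -> &[String] {
        self.by_dir
            .get(dirname)
            .and_then(|names| names.get(basename))
            .map(|pkgs| pkgs.as_slice())
            .unwrap_or(&[])
    }
}

/// Toy resolver (an assumption for this sketch): `/lib` is a symlink to `/usr/lib`.
pub fn toy_resolve_dir(dir: &str) -> String {
    match dir.strip_prefix("/lib") {
        Some(rest) if rest.is_empty() || rest.starts_with('/') => format!("/usr/lib{rest}"),
        _ => dir.to_string(),
    }
}
```

The point of the sketch is that the database is read once on insertion, and every later lookup is a pair of hash-map probes rather than a fresh rpmdb query.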

To minimize the memory overhead of maintaining the file mapping, a simple string cache is also added.
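A string cache of this kind can be sketched as a small interner. This is only an assumption about the general shape, not the PR's actual type: `StringCache` and `intern` are hypothetical names, and `Rc<str>` is one possible choice of shared string.

```rust
use std::collections::HashSet;
use std::rc::Rc;

/// Hypothetical interner: each distinct string is allocated once and shared.
#[derive(Default)]
pub struct StringCache {
    seen: HashSet<Rc<str>>,
}

impl StringCache {
    /// Return a shared handle for `s`, allocating only on first sight.
    pub fn intern(&mut self, s: &str) -> Rc<str> {
        if let Some(existing) = self.seen.get(s) {
            return Rc::clone(existing);
        }
        let fresh: Rc<str> = Rc::from(s);
        self.seen.insert(Rc::clone(&fresh));
        fresh
    }

    /// Number of distinct strings stored.
    pub fn len(&self) -> usize {
        self.seen.len()
    }
}
```

Since many files share the same dirname, interning means a directory such as `/usr/bin` is stored once no matter how many entries refer to it.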

Closes: #4880

openshift-ci bot commented Jan 11, 2024

Hi @tymlipari. Thanks for your PR.

I'm waiting for a coreos member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tymlipari
Contributor Author

One note: I said "effectively" equivalent to call out a small delta I noticed when doing a test compose of a Silverblue 39 repo. Every file digest (I dumped state.content) matched the previous output except for 6 files that had previously been attributed to cups-client-1:2.4.7-5.fc39.x86_64:

/usr/bin/cancel
/usr/bin/lp
/usr/bin/lpq
/usr/bin/lpr
/usr/bin/lprm
/usr/bin/lpstat

These files only have *.cups matches in the rpm itself (rpm -q -l cups-client-1:2.4.7-5.fc39.x86_64). My guess is that when rpm does a stat it's somehow resolving something against the native filesystem that matches? I can dig deeper on this mismatch, but it's the only delta I observed building the latest Silverblue F39 branch.

@cgwalters
Member

Thanks @tymlipari ! I agree this is an annoying paper cut. I only skimmed your code but I find myself wondering if we are missing some sort of potential much simpler change here to optimize things. As I understand things, the rpm database is providing an index for this...is there some sort of flag we need to pass?

One major historical foot-gun in librpm, as I understand it, is that it sometimes does signature verification by default, and one needs to do e.g.
rpmtsSetVSFlags (rpmdb_ts, _RPMVSF_NOSIGNATURES | _RPMVSF_NODIGESTS);
as one can see in various bits of our code.

@tymlipari
Contributor Author

tymlipari commented Jan 12, 2024

@cgwalters I'm not sure there is, at least not without needing to expose new functionality from within rpm.

In the RPM source code, the only place I see it build a full fingerprint cache of its own for the full package set is within rpmtsRun via rpmtsPrepare. But in this case we don't have an rpmts with any transactions since we're just querying, so this isn't useful.

So then I started digging through RPM to understand the existing implementation (rpmtsInitIterator(ts, RPMDBI_INSTFILENAME,...)) to see if there's an optimal way to get this info cheaply. Of the available database tags we can query for, RPMDBI_INSTFILENAME and RPMDBI_PROVIDENAME seem the most appropriate for what we're doing. (Side note: I built main to use only RPMDBI_PROVIDENAME and it appeared to match nothing when I ran it.)

Looking at how RPM handles RPMDBI_INSTFILENAME, it internally maps to the basenames table. If provided a search string, it'll use rpmdbFindByFile to do the lookup, otherwise it'll just return the full database set.

The current implementation uses the rpmdbFindByFile path, which is problematic because when it executes fpLookup, it inadvertently stats the build system's filesystem (nothing has done a chroot to the rpm root dir). I suspect this is the primary reason building in cosa is so much slower than invoking rpm-ostree directly on my laptop: the container filesystem is likely not well optimized for this scenario. Less problematic, but also unhelpful for our scenario, is that we can't retain the fingerprint cache across multiple queries. (ninja edit: oh, and when doing a file-based lookup, it's going to execute a database query for every file we have to check).

So an alternative is what I've coded up here: take the raw database contents and cache them locally to speed up the lookup. (ninja edit: This results in essentially a single database query, but means we have to hold more memory on our end). One caveat is that it's still necessary to resolve each path against the OSTree repo fs contents, because several RPMs have paths in the database that don't match their real filesystem paths (e.g. rpm -ql kernel-modules-6.6.3-200.fc39.x86_64 lists its paths as being installed underneath /lib/modules rather than /usr/lib/modules), so the symlinks have to be resolved when checking path equivalency. This is what I've implemented in RpmFileDb::try_resolve_real_fs_path. Without this, a not-insignificant number of paths don't match and thus get bucketed into "rpmostree-unpackaged-content".
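To make the symlink caveat concrete, here is a minimal sketch of resolving a database path against a tree, written against a toy symlink table rather than a real OSTree repo. `resolve_symlinks` is a hypothetical helper, not `RpmFileDb::try_resolve_real_fs_path`, and it assumes every link target is an absolute, already-normalized path.

```rust
use std::collections::HashMap;

/// Resolve `path` component by component against a toy symlink table
/// (absolute link path -> absolute target path). Both the table and the
/// absolute-target restriction are assumptions made for this sketch.
pub fn resolve_symlinks(path: &str, links: &HashMap<String, String>) -> String {
    let mut resolved = String::new();
    for comp in path.split('/').filter(|c| !c.is_empty()) {
        resolved.push('/');
        resolved.push_str(comp);
        // Follow chained links, with a hop bound so a cycle cannot loop forever.
        let mut hops = 0;
        while let Some(target) = links.get(&resolved) {
            resolved = target.clone();
            hops += 1;
            if hops > 40 {
                break;
            }
        }
    }
    if resolved.is_empty() {
        "/".to_string()
    } else {
        resolved
    }
}
```

With a table containing `/lib -> /usr/lib`, a database path under `/lib/modules` resolves to the same string as the real path under `/usr/lib/modules`, so the two compare equal.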

Other approaches that could work, but would require I think either more buy-in from RPM or more risk from rpm-ostree:

  1. Within rpm - Attach a fingerprint cache to the rpmts that can be reused across match iterators. Also fix the fingerprint cache to not nakedly point at the root filesystem if we haven't done a chroot.

  2. Within rpm-ostree - when applying rpms to a tree, rewrite their filesystem paths in the resulting rpmdb to the real path, so that we can guarantee future lookups don't need to rely on symlink resolution. I'm not sure what the compat risk here would be, but I assume it's non-zero.

@cgwalters
Member

Hmm...can we get the same effect here by forking off the target tree as a container and running librpm from inside there, talking to it over IPC?

@tymlipari
Contributor Author

I believe that would only fix the issue of stat-ing the host filesystem instead of the ostree filesystem. It wouldn't resolve the runtime complexity, and as observed when running via cosa, one of the container filesystems in play seems unoptimized for this kind of lookup pattern (see the amount of time spent in kernel space in the before run). I'd worry that forcing this operation to always run in a container would potentially bring the worse runtimes to all rpm-ostree users, rather than likely just cosa users as today.

Member

@cgwalters cgwalters left a comment

So you've obviously put some effort into this and it is clearly solving a real problem. To be honest I was not happy with the previous code at all. But, I also am not finding myself thinking that this new approach is an unambiguous improvement...the chance for new bugs seems somewhat high.

I'd really like to be able to efficiently backreference files from librpm without us carrying a new cache.

src/libpriv/rpmostree-refts.cxx (outdated)
@cgwalters
Member

Again I'm not opposed to just merging this but let's discuss some more

@cgwalters cgwalters added area/baseimage-builds difficulty/medium medium complexity/difficutly issue container-native triaged This issue was triaged labels Feb 28, 2024
@cgwalters
Member

(To pass CI you need to make clang-format)

To be very honest if you're likely to make more contributions later, I'm inclined to click commit and we can keep improving this more.

@AdamWill

time to build an Onyx ostree container (in a mock chroot on my laptop) with this patch:

real	13m31.828s
user	7m58.593s
sys	2m58.602s

without it:

real	88m42.291s
user	34m54.632s
sys	50m33.717s

@cgwalters
Member

To keep conversation in one place, moving this here:

<adamw> so, we (nirik and I) would quite like to fix the very very long ostree container build times for fedora. where do you stand on #4768 currently? are you minded to merge it? would you hate if we backported it?

It's not quite "backporting" if a PR isn't merged 😄

Well...yeah. I still have the feeling we should be able to figure out how to optimize the rpmdb lookup without a whole other mapping but...yes, it's unlikely I (or anyone else) will have bandwidth in the near future to dive into this more, so...in the interest of forward progress I will take a look at just doing some cleanups on this and merging.

@cgwalters
Member

@tymlipari can you allow me to force-push to your branch? There should be a checkbox on this PR

@@ -0,0 +1,43 @@

Member

Needs an SPDX header here

@@ -6094,16 +6124,6 @@ cache_branch_to_nevra (::rust::Str nevra) noexcept
rpmostreecxx$cxxbridge1$cache_branch_to_nevra (nevra, &return$.value);
return ::std::move (return$.value);
}

Member

Any reason this code is moved? Maybe split this change into another commit?

Contributor Author

It was a result of me running ./ci/verify-cxx.sh, I can revert if you'd like though.

Member

OK, then maybe let's split this change into its own commit?

Member

This is minor so optional.

@tymlipari
Contributor Author

Apologies, I hadn't been able to get around to this as quickly as I hoped.

I've updated the PR to move all of the logic behind this into the Rust side of the codebase. Now, rather than walking the filesystem and asking rpm which package provides each file, we walk each package's file set and look each entry up in the ostree filesystem. Rather than building the content map directly, it builds two maps at runtime: a) checksum -> {path} and b) path -> {pkg}. After walking the packages, the filesystem is walked one last time to ensure there's always a checksum -> path mapping for every file in the tree. Any path without a path -> pkg entry is assumed to be unpackaged.

Overall, this is essentially the same algorithm as the previous iteration of this PR, but a bit more cleanly integrated with the existing codebase.
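The two-map approach described above can be sketched as follows. This is a simplified model under stated assumptions, not the merged code: the tree is a plain `path -> checksum` map, package file lists are assumed to be already resolved to real paths, and `build_mapping` and `UNPACKAGED` are hypothetical names (the bucket string is the one mentioned earlier in this thread).

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Bucket name for files that no package claims.
pub const UNPACKAGED: &str = "rpmostree-unpackaged-content";

/// Build checksum -> owning packages from a toy tree (path -> checksum)
/// and toy package file lists (pkg -> paths).
pub fn build_mapping(
    tree: &BTreeMap<String, String>,
    packages: &BTreeMap<String, Vec<String>>,
) -> BTreeMap<String, BTreeSet<String>> {
    let mut checksum_to_paths: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    let mut path_to_pkgs: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();

    // Pass 1: walk each package's file set and look it up in the tree.
    for (pkg, paths) in packages {
        for path in paths {
            if let Some(checksum) = tree.get(path) {
                checksum_to_paths.entry(checksum.clone()).or_default().insert(path.clone());
                path_to_pkgs.entry(path.clone()).or_default().insert(pkg.clone());
            }
        }
    }

    // Pass 2: walk the tree so every file has a checksum -> path entry.
    for (path, checksum) in tree {
        checksum_to_paths.entry(checksum.clone()).or_default().insert(path.clone());
    }

    // Combine: a path with no path -> pkg entry is treated as unpackaged.
    let mut result: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for (checksum, paths) in &checksum_to_paths {
        let owners = result.entry(checksum.clone()).or_default();
        for path in paths {
            match path_to_pkgs.get(path) {
                Some(pkgs) => owners.extend(pkgs.iter().cloned()),
                None => {
                    owners.insert(UNPACKAGED.to_string());
                }
            }
        }
    }
    result
}
```

Files listed by a package but absent from the tree are simply skipped in pass 1, which is why the final tree walk is needed to catch everything that remains.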

}

// Resolve our parent, then resolve ourselves as a direct child
if let Some(parent) = resolve_ostree_paths(path.parent().unwrap(), fsroot, cache) {
Member

Minor style nit, to avoid deep nesting below, we can do

let parent = if let Some(parent) = parent {
  parent
} else {
  return Ok(None);
};

It feels weird to mention the variable 4 times, but it does keep the code simpler in the end.
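For illustration, the early-return shape suggested here can be written as a self-contained function, next to the equivalent `let ... else` form available in newer Rust (1.65 and later). The function names and the `Result<Option<usize>, String>` signature are hypothetical and chosen only to make the sketch compile; they are not taken from the PR.

```rust
/// Early return using `if let ... else`, as in the suggestion above.
pub fn parent_len(parent: Option<&str>) -> Result<Option<usize>, String> {
    let parent = if let Some(parent) = parent {
        parent
    } else {
        return Ok(None);
    };
    // The rest of the function works with `parent` without extra nesting.
    Ok(Some(parent.len()))
}

/// The same control flow using `let ... else`.
pub fn parent_len_let_else(parent: Option<&str>) -> Result<Option<usize>, String> {
    let Some(parent) = parent else {
        return Ok(None);
    };
    Ok(Some(parent.len()))
}
```

Both forms keep the main logic at the top indentation level; which one fits depends on the minimum Rust version the project supports.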

let link_target = child_info.symlink_target().unwrap();

// Due to a bug in OSTree's Gio.File implementation, we cannot
// just do `parent.resolve_relative_path` here as it doesn't correctly
Member

Eek.

Contributor Author

Yeah, this was a frustrating find. I couldn't figure out why some paths weren't matching, and then after adding a bunch of debugging statements found it was producing paths like /usr/lib/something/../../usr/bin/something_else rather than resolving the .. paths along the way. The documentation for Gio.File says that resolve_relative_path should always produce an absolute path, which I guess it technically is here, but IMO it'd be more correct to resolve the real path. Especially because when you call query_info on the resulting file, it doesn't actually resolve the real file info (I was getting FileType::Unknown).

I'll try and see if I can cobble together a minified repro and file an issue over on the ostree repo. This logic I added does appear to handle these kinds of paths for us just fine when testing a Silverblue build, but still annoying to have to handle.
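As a rough sketch of what collapsing `..` components looks like, the helper below normalizes a path purely lexically using the standard library. It is not the logic added in this PR, and `normalize_lexically` is a hypothetical name. Lexical normalization is only valid when the components being popped are real directories rather than symlinks, which is why a real implementation would need to resolve each component against the tree as it goes.

```rust
use std::path::{Component, Path, PathBuf};

/// Collapse `.` and `..` components without touching the filesystem.
/// Assumes no intermediate component is a symlink.
pub fn normalize_lexically(path: &Path) -> PathBuf {
    let mut out = PathBuf::new();
    for comp in path.components() {
        match comp {
            // `..` drops the last pushed component; at the root it is a no-op.
            Component::ParentDir => {
                out.pop();
            }
            Component::CurDir => {}
            other => out.push(other.as_os_str()),
        }
    }
    out
}
```

For example, a hypothetical input of `/usr/lib/something/../../bin/something_else` comes out as `/usr/bin/something_else`.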

@cgwalters
Member

/ok-to-test

@cgwalters
Member

Why do you have a revert commit there in the middle? Can you squash the commits please?

@tymlipari tymlipari force-pushed the rpmdb_cache_with_treefs branch from 20762ff to 12d4a7d Compare March 28, 2024 23:18
@tymlipari
Contributor Author

> Why do you have a revert commit there in the middle? Can you squash the commits please?

Ah, I usually like to preserve the full history of a PR and squash when it's landed, but I can go ahead and squash it now instead.

@travier
Member

travier commented Mar 29, 2024

Thanks a lot for working on this. This looks much more readable in Rust :)

Member

@cgwalters cgwalters left a comment

Thanks!

@cgwalters cgwalters merged commit aae3313 into coreos:main Mar 29, 2024
17 checks passed
@cgwalters
Member

BTW for future changes can you include all the content from the PR description in commit messages? We try to have git log be useful here. I was able to do it when merging by editing the squashed commit, so no big deal.

Thanks again!

Labels
area/baseimage-builds container-native difficulty/medium medium complexity/difficutly issue ok-to-test triaged This issue was triaged
Development

Successfully merging this pull request may close these issues.

build_mapping_recurse is very slow, causes ostree container builds to take a long time
4 participants